SYSTEM AND METHOD OF DYNAMICALLY GENERATING AN ELECTRONIC DOCUMENT BASED UPON DATA ANALYSIS

PRIORITY CLAIM 

The benefit under 35 U.S.C. § 119(e) of the following U.S. provisional application is hereby claimed and incorporated by reference, in its entirety:

This application is related to U.S. application Ser. No.: 09/456,778, entitled "A SYSTEM AND METHOD OF OBFUSCATING DATA"; U.S. application Ser. No.: 09/456,778, entitled "A SYSTEM AND METHOD OF DYNAMICALLY GENERATING INDEX INFORMATION FOR A DATA OBJECT BASED UPON CLIENT PROVIDED SEARCH WORDS"; U.S. application Ser. No.: 09/456,600, entitled "A SYSTEM AND METHOD OF DYNAMICALLY CUSTOMIZING THE CONTENT OF A NETWORK ACCESSIBLE ELECTRONIC RESOURCE BASED UPON THE IDENTITY OF THE REQUESTOR"; U.S. application Ser. No.: 09/456,793, entitled "A SYSTEM AND METHOD OF DYNAMICALLY GENERATING INDEX INFORMATION"; U.S. application Ser. No.: 09/45600, entitled "A SYSTEM AND METHOD OF PROVIDING MULTIPLE ITEMS OF INDEX INFORMATION FOR A SINGLE DATA OBJECT", which are being filed concurrently herewith on Dec. 8, 1999.

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The field of the invention relates to information retrieval systems. More particularly, the field of the invention relates to generating index information for data objects.

2. Description of the Related Technology 
Information retrieval (IR) systems index documents by searching for keywords that are contained within the documents. Typically, the searches are not performed on the documents themselves. Instead, words are extracted from the document and are then indexed in separate data structures optimized for searching.
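The separate, search-optimized data structure mentioned above is commonly an inverted index, which maps each extracted word to the documents that contain it. The following is a minimal Python sketch assuming naive lowercase word tokenization; the function names and sample documents are hypothetical and are not taken from any particular IR system.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric words (a deliberately naive rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each extracted word to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)

def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents containing every query word (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {"a": "Secure documents and DRM", "b": "Indexing HTML documents"}
idx = build_inverted_index(docs)
print(sorted(search(idx, "html documents")))  # ['b']
```

The query is answered from the index alone, without reopening the documents, which is why an IR system normally needs full access to a document only at indexing time.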

However, secure documents, such as documents that are protected by digital rights management (DRM) software, present a special problem for IR systems. Traditionally, IR systems rely upon having full access to the contents of the document to prepare the index information for the document. For example, IR systems that index HyperText Markup Language (HTML) documents on the Internet typically open each HTML document via its Uniform Resource Locator (URL), then download, parse, and index the entire document.

Secure software, however, does not permit this kind of unrestricted access. Access is restricted to those applications that are both authorized and trusted by the secure software. For security reasons, all other applications are prevented from accessing the protected document.

One way to solve this problem is to retrofit all pre-existing 
IR systems so that they are "rights enabled." This solution 
permits IR systems to communicate directly with secure 
software to obtain the document source. However, this 
approach makes a number of unrealistic assumptions, 
including: (i) that it is possible to retrofit legacy IR systems 
such that they would comply with the secure software's 
security requirements; (ii) that all secure system providers 
would be willing or able to make the necessary changes in 
a timely manner; and (iii) that it is possible to establish the 
necessary trust relationships between every secure provider, 
copyright holder, and IR system provider. This approach has 
attendant flaws and there is a need for a better solution. 

Another problem with preparing index information for IR systems is that each IR system has different indexing algorithms for organizing and storing information. IR systems often analyze the header of the electronic document when selecting the index information for the electronic document. The header includes meta-information regarding the content of the document. However, not all of the IR systems retrieve the same keywords from the electronic document when selecting the index information. For example, some IR systems remove duplicative words from the metatag information, while others do not. Furthermore, some IR systems recognize phrases, while others do not. Accordingly, it is difficult to customize index information that is ideally suited for use with more than one IR system.

Thus, there is a need for a system for providing index information to IR systems. The system should be able to provide information to the IR systems that is almost as usable for indexing as the original document. Preferably, the system should not require the modification of any legacy IR systems. Furthermore, it should be difficult to reconstruct the original document source (or any reasonable facsimile thereof) from the provided index information. In addition, the system should be able to automatically customize the index information regarding an electronic document on an IR system-by-IR system basis.

SUMMARY OF THE INVENTION 

In one embodiment of the invention, a method of generating index information for a data object, the method comprising generating index information for the data object, wherein the index information includes a set of one or more keywords, selecting one or more of the keywords from the index information, identifying one or more words that are associated with the selected keywords, and adding the identified words to the set of keywords, the identified words providing additional keywords for the index information for the data object.

In yet another embodiment of the invention, a method of 
generating index information for a data object, the method 
comprising generating index information for the data object, 
identifying one or more words that are common to a group 
of data objects that includes the data object, and adding the 
identified words to the index information. 
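Read one way, the words "common to a group of data objects" are the words shared by the keyword set of every member of the group (for example, every chapter of one book). A minimal sketch under that interpretation follows; the helper names and sample data are hypothetical, and a frequency threshold across the group would be an equally plausible reading of the embodiment.

```python
from typing import List, Set

def common_words(group: List[Set[str]]) -> Set[str]:
    """Words that appear in the keyword set of every data object in the group."""
    if not group:
        return set()
    shared = set(group[0])
    for keywords in group[1:]:
        shared &= keywords
    return shared

def add_group_words(index_info: List[str], group: List[Set[str]]) -> List[str]:
    """Append group-common words that the index information does not yet contain."""
    present = set(index_info)
    return index_info + sorted(common_words(group) - present)

chapters = [
    {"whale", "ship", "ahab"},
    {"whale", "ship", "sea"},
    {"whale", "ship", "storm"},
]
print(add_group_words(["sea"], chapters))  # ['sea', 'ship', 'whale']
```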

In yet another embodiment of the invention, a method of generating index information for a data object, the method comprising generating index information for the data object, wherein the index information includes a set of one or more keywords, identifying the roots of selected keywords, and substituting the selected keywords with the roots.

In yet another embodiment of the invention, a method of generating index information for a data object, the method comprising generating index information for the data object, wherein the index information includes a set of one or more keywords, classifying one or more of the keywords into one or more classifications, selecting at least one of the classifications, and removing one or more of the keywords that are members of a selected classification of the keywords.

In yet another embodiment of the invention, a method of generating index information for a data object, the method comprising generating index information for a data object, wherein the index information comprises one or more keywords, selecting one or more of the keywords, identifying one or more keywords that are associated with the selected keywords with a semantic network, and adding the identified keywords to the index information.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating one network configuration that comprises a client computer and a server computer that are connected via a network.

FIG. 2 is a data flow diagram illustrating in further detail the communication between the client computer and the server computer of FIG. 1.

FIG. 3 is a block diagram illustrating in further detail the software components of the server computer of FIG. 2.

FIG. 4 is a block diagram illustrating the components of a user database that is maintained by the server computer of FIG. 1.

FIG. 5 is a top level flowchart illustrating a process for preparing a response to a request for an electronic resource that is maintained by the server computer of FIG. 1.

FIGS. 6 and 7 are collectively a flowchart illustrating in further detail the states of FIG. 5 whereby the server computer prepares a response to the request for the electronic resource.

FIG. 8 is a block diagram illustrating one of the data objects shown in FIG. 2 being partitioned into multiple sections, each of the sections comprising a chapter in a book.

FIG. 9 is a representational block diagram illustrating an exemplary screen display that is transmitted to the client computer (FIG. 1) from the server computer (FIG. 1) in response to a request for an electronic resource from the client computer.

FIG. 10 is a flowchart illustrating an obfuscation process that is performed by the server computer of FIG. 2 with respect to index information that is associated with one of the data objects of FIG. 2.

FIGS. 11 and 12 are collectively a flowchart illustrating in further detail a process for dynamically preparing the index information for an electronic document in response to a request for a network resource.

FIG. 13 is a block diagram illustrating the contents of an exemplary data object of FIG. 2.

FIG. 14 is a block diagram illustrating a set of index information that is based upon the exemplary data object shown in FIG. 13.

FIG. 15 is a block diagram illustrating the state of the index information of FIG. 14 subsequent to one or more reserved words being added to the index information.

FIG. 16 is a block diagram illustrating the state of the index information of FIG. 15 subsequent to the index information being randomized.

FIG. 17 is a block diagram illustrating an exemplary electronic document that is created by the server computer of FIG. 1 for transmission to the client computer of FIG. 1.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

The following detailed description is directed to certain specific embodiments of the invention. However, the invention can be embodied in a multitude of different ways as defined and covered by the claims.

SYSTEM OVERVIEW

Referring to FIG. 1, an exemplary network configuration 100 will be described. A user 102 communicates with a computing environment which may include multiple server computers 108 or a single server computer 110 in a client/server relationship on a computer network 116. In a client/server environment, each of the server computers 108, 110 includes a server program which communicates with a client computer 115.

The server computers 108, 110, and the client computer 115 may each have any conventional general purpose single- or multi-chip microprocessor such as a Pentium® processor, a Pentium® Pro processor, an 8051 processor, a MIPS® processor, a Power PC® processor, or an ALPHA® processor. In addition, the microprocessor may be any conventional special purpose microprocessor such as a digital signal processor or a graphics processor. Furthermore, the server computers 108, 110 and the client computer 115 may be desktop, server, portable, hand-held, set-top, or any other desired type of configuration. Furthermore, the server computers 108, 110 and the client computer 115 each may be used with various operating systems such as: UNIX, LINUX, Disk Operating System (DOS), VxWorks, PalmOS, OS/2, Windows 3.X, Windows 95, Windows 98, and Windows NT.

The server computers 108, 110 and the client computer 115 each include a network terminal equipped with a video display, keyboard and pointing device. In one embodiment of network configuration 100, the client computer 115 includes a network browser 120 that is used to access the server computer 110. In one embodiment of the invention, the network browser 120 is the Internet Explorer, licensed by Microsoft Inc. of Redmond, Wash. The user 102 at the client computer 115 may utilize the browser 120 to remotely access the server program using a keyboard and/or pointing device and a visual display, such as a monitor 118. It is noted that although only one client computer 115 is shown in FIG. 1, the network configuration 100 can include hundreds of thousands of client computers and upwards.

The network 116 may include any type of electronically connected group of computers including, for instance, the following networks: a virtual private network, a public Internet, a private Internet, a secure Internet, a private network, a public network, a value-added network, an intranet, and the like. In addition, the connectivity to the network may be, for example, remote modem, Ethernet (IEEE 802.3), Token Ring (IEEE 802.5), Fiber Distributed Data Interface (FDDI) or Asynchronous Transfer Mode (ATM). The network 116 may connect to the client computer 115, for example, by use of a modem or by use of a network interface card that resides in the client computer 115.

The server computers 108 may be connected via a wide area network 106 to a network gateway 104, which provides access to the wide area network 106 via a high-speed, dedicated data circuit.

Devices, other than the hardware configurations described above, may be used to communicate with the server computers 108, 110. If the server computers 108, 110 are equipped with voice recognition or DTMF hardware, the user 102 can communicate with the server programs by use of a telephone 124. Other connection devices for communicating with the server computers 108, 110 include a portable personal computer 126 with a modem or wireless connection interface, a cable interface device 128 connected to a visual display 130, or a satellite dish 132 connected to a satellite receiver 134 and a television 136. For convenience of description, each of the above hardware configurations is included within the definition of the client computer 115. Other ways of allowing communication between the user 102 and the server computers 108, 110 are envisioned. Further, it is noted that the server computers 108, 110 and the client computer 115 may not necessarily be located in the same room, building or complex. In fact, the server computers 108, 110 and the client computer 115 could each be located in different states or countries.

FIG. 2 is a block diagram illustrating in further detail selected aspects of FIG. 1. FIG. 2 illustrates the communication between the client computer 115, a plurality of information retrieval ("IR") systems 208A-208M, and the server computers 108, 110. Each of the IR systems 208A-208M may be embodied in any of the hardware configurations set forth above with respect to the server computer 110 or the client computer 115. FIG. 2 illustrates that the client computer 115 is connected to the server computer 110 and the plurality of IR systems 208A-208M via the network 116. It is noted that although only three IR systems 208A-208M are shown in FIG. 2, the client computer 115 and the server computer 110 can be connected to a large number, e.g., hundreds or more, of IR systems. For convenience of description, the remainder of the discussion will refer only to the server computer 110 when referring to the server computers 108, 110. However, it is to be appreciated that the description of the operation of the server computer 110 equally applies to the operation of the server computers 108. Optionally, the server computer 110 and the IR systems 208A-208M, or selected ones thereof, may be integrated on a single computer platform.

The IR systems 208A-208M can include one or more proprietary or commercial search engines, including, only by way of example: AOL Search located at <http://search.aol.com/>, ALTAVISTA located at <http://www.altavista.com/>, ASKJEEVES located at <http://www.askjeeves.com/>, Direct Hit located at <http://www.directhit.com/>, Excite located at <http://www.excite.com/>, Hot Bot located at <http://www.hotbot.com/>, Inktomi located at <http://www.inktomi.com/>, MSN Search located at <http://search.msn.com/>, Netscape located at <http://search.netscape.com/>, Northern Light located at <http://www.northernlight.com/>, and Yahoo located at <http://www.yahoo.com/>. The IR systems 208A-208M can also include a system licensed for private use and hosted within an intranet or an extranet. As an example, such an IR system can include Ultraseek licensed by InfoSeek of Sunnyvale, Calif.

To publish information regarding a plurality of data objects 216A-216N, the server computer 110 associates each of the data objects 216A-216N with a selected URL, and then the server computer 110 notifies the IR systems 208A-208M of each of the selected URLs. For convenience of description, the data object that is associated with a selected URL is referred to below as the "source data object."

Selected ones of the IR systems 208A-208M use a software program called a "spider" (not shown) to survey the electronic resources that are stored by the computers connected to the network 116, such as the server computer 110. Electronic resources can comprise prepared electronic documents, or, alternatively, dynamically prepared electronic documents which are the output of scripts of the server computer 110. In one embodiment, the spiders are programmed to visit a server that has been identified by a server administrator as being new or updated. The spider follows all of the hypertext links in each of the electronic documents of the server until all the electronic documents have been located. An indexing program (not shown) reads the surveyed electronic documents and creates an index database based on the words contained in each of the surveyed electronic documents. In another embodiment of the invention, the server computer 110 provides a list of electronic documents in the server computer 110 that should be indexed by the IR system.

In one embodiment, the server computer 110 knows the indexing characteristics of the IR systems 208A-208M. In response to a request for a selected electronic resource, e.g., an electronic document, the server computer 110 dynamically generates an electronic document that comprises the index information for the source data object that is associated with the request. As defined herein, the term "dynamically generates" comprises either (i) preparing in real-time an electronic document or (ii) transmitting a pre-prepared electronic document that is associated with the URL and that is customized particularly for a selected requestor.

In customizing the index information, the server computer 110 attempts to maximize the odds that a user will find the index information for the source data object within the IR system. The index information for the source data object may optionally be obfuscated such that the index information may not be readily used for purposes other than indexing. Furthermore, in one embodiment of the invention, the server computer 110 maintains a database 210 that stores metadata for each of the data objects 216A-216N. By analyzing the metadata in the database 210, the server computer 110 can identify words that are not in the data object but that, if included in the index information for the source data object, would be relevant, thereby increasing the odds that a user will find the source data object.

Once the electronic document has been indexed by the IR systems 208A-208M, the user 102 (FIG. 1) may supply search terms to one or more of the IR systems 208A-208M to receive a list of relevant documents. In one embodiment, one or more of the IR systems 208A-208M contain index information for documents that are maintained by servers other than the server computer 110.

When the user 102 enters a query using a selected one of the IR systems 208A-208M, the query is checked against the IR system's index database. The best matches are then returned to the user 102 as "hits", i.e., possibly relevant electronic documents based upon the search words in the query. For each of the hits, the selected IR system displays at least some of the index information that is associated with the hit and an address, e.g., a URL, of the hit. In one embodiment of the invention, the displayed addresses of the identified electronic documents are selectable by using one or more input devices, such as a mouse. When the user selects an address, the browser 120 automatically requests an electronic document from the selected address.

Upon receiving the request, the server computer 110 determines whether the requester is the client computer 115 or one of the IR systems 208A-208M. If the request is from one of the IR systems, as discussed above, the server computer 110 dynamically generates an electronic document that includes the index information for the source data object of the network request.

However, if the server computer 110 determines that the requester is the client computer 115, the server computer 110 determines whether the client computer 115 is authorized to access the source data object. If the client computer 115 is authorized to access the source data object, the server computer 110 transmits the source data object to the client computer 115. However, if the client computer 115 is not authorized to access the source data object, the server computer 110 generates an electronic document that informs the user of the steps the user must perform to obtain access to the source data object.
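The three-way decision described above (IR system, authorized client, unauthorized client) can be sketched as follows. The sketch assumes that IR-system spiders are recognized by hypothetical substrings in the User-Agent header and that licenses are kept in a simple mapping from a user ID to licensed URLs; neither detail is specified in the text, and the returned labels are placeholders for the three kinds of response.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical User-Agent substrings used to recognize IR-system spiders.
SPIDER_TOKENS = ("spider", "crawler")

@dataclass
class Request:
    user_agent: str
    user_id: Optional[str]  # None when the requester is not identified
    url: str

def is_ir_system(req: Request) -> bool:
    """Crude requester classification based only on the User-Agent string."""
    agent = req.user_agent.lower()
    return any(token in agent for token in SPIDER_TOKENS)

def choose_response(req: Request, licenses: Dict[str, Set[str]]) -> str:
    """Return a label naming which kind of response the server would send."""
    if is_ir_system(req):
        return "index-document"      # index information for the source data object
    if req.user_id is not None and req.url in licenses.get(req.user_id, set()):
        return "source-data-object"  # authorized client receives the object itself
    return "access-instructions"     # steps the user must take to obtain access

print(choose_response(Request("Mozilla/2.02Gold (WinNT; I)", None, "/book"), {}))
```

User-Agent strings can be spoofed, so a deployed system would likely need a stronger way to identify IR systems; the string match only keeps the example short.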

The electronic request from the client computer 115 can correspond to one of any number of network protocols. In one embodiment of the invention, the electronic request comprises a Hypertext Transfer Protocol (HTTP) request. However, it is to be appreciated that other types of network communication protocols may be used.

HTTP allows the client computer 115, the server computer 110, and the IR systems 208A-208M to communicate with each other. HTTP defines how messages are formatted and transmitted, and what actions the server computer 110, the client computer 115, and the IR systems 208A-208M should take in response to various commands. According to HTTP, the client computer 115 can request a network resource from the server computer 110. For example, when a URL is selected in the browser 120 (FIG. 1), the browser 120 sends a GET command to the server that is hosting the URL, directing the server to fetch and transmit the electronic resources that are associated with the URL.

It is noted that all HTTP transactions follow the same general format. Each client request and server response has three parts: a request or response line, a header section, and the entity body. The client initiates a transaction as follows. First, the client computer sends a document request by specifying an HTTP command called a "method", e.g., GET or POST, followed by a resource address and an HTTP version number. Next, the client sends optional header information to inform the server of its configuration and the document formats it will accept. The header information can include the name and version number of the client software as well as resource preferences. For example, an exemplary GET transaction is as follows:

GET /index.html HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/2.02Gold (WinNT; I)
Host: www.MediaDNA.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

It is noted that the "User-Agent" portion of the GET transaction describes the name or identifier of the requester. The body portion of a GET transaction is typically empty.

According to the present invention, in response to an HTTP request for an electronic resource that is associated with a selected URL, the server computer 110 transmits an electronic document having index or other descriptive information regarding the source data object that is associated with the request or, alternatively, the source data object itself, depending on the identity and authorization of the requester.
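To make the three-part structure of an HTTP request concrete, the sketch below splits the example GET transaction into its request line, headers, and body. It is a simplified parser written for illustration (the name `parse_http_request` is hypothetical); it assumes CRLF line endings and ignores folded headers and other edge cases that a production HTTP library would handle.

```python
from typing import Dict, Tuple

def parse_http_request(raw: str) -> Tuple[str, str, str, Dict[str, str], str]:
    """Split a raw HTTP/1.x request into method, target, version, headers, body."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the header section
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)  # the request line
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body

raw = (
    "GET /index.html HTTP/1.0\r\n"
    "Connection: Keep-Alive\r\n"
    "User-Agent: Mozilla/2.02Gold (WinNT; I)\r\n"
    "Host: www.MediaDNA.com\r\n"
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\n"
    "\r\n"
)
method, target, version, headers, body = parse_http_request(raw)
print(method, target, headers["user-agent"])  # GET /index.html Mozilla/2.02Gold (WinNT; I)
```

The parsed `user-agent` value is the piece of the request that identifies the requester, and the empty `body` reflects that a GET transaction typically carries no entity body.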

In one embodiment of the invention, the electronic document includes a header and a body. The header and the body for the electronic document are dynamically created and customized in response to an electronic request for an electronic resource by the client computer 115 and/or one of the IR systems 208A-208M. The header describes properties of the document such as title, document toolbar, scripts and meta information. The body defines the page that is displayed to the user once the electronic document is received by the requester.

For example, assuming the electronic document is an HTML document, the header can include the following elements: BASE, LINK, META, and TITLE. The BASE element defines an absolute URL that is used to resolve relative URLs within the document. The LINK element defines relationships between the document and other documents. The LINK element can be used to create tool bars, to link to a style sheet, a script, or a printable version of the document, and to embed authorship details. The META element includes information about the document not defined by other elements. The META element supplies generic meta information using name/value pairs. The content of the TITLE element is displayed in the window title. As is discussed in further detail below, the server computer 110, depending on the embodiment, customizes one or more elements of the header and body.
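A minimal sketch of how a document generator might place index information in these header elements, using the conventional `keywords` and `description` META names. The function name and its arguments are hypothetical; values pass through the standard `html.escape` function so that keywords containing quotes or ampersands cannot break the markup.

```python
from html import escape
from typing import List

def build_html_head(title: str, keywords: List[str], description: str, base_url: str) -> str:
    """Assemble an HTML HEAD section that carries index information."""
    kw = escape(", ".join(keywords))  # escape() also escapes quote characters by default
    return "\n".join([
        "<head>",
        f'<base href="{escape(base_url)}">',
        f"<title>{escape(title)}</title>",
        f'<meta name="keywords" content="{kw}">',
        f'<meta name="description" content="{escape(description)}">',
        "</head>",
    ])

print(build_html_head("Moby Dick", ["whale", "ship"], "A novel", "http://www.example.com/"))
```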

The data objects 216A-216N can be of any arbitrary format and can contain any type of data. For example, the data objects 216A-216N can include an electronic document according to any open or proprietary format, e.g., HTML, PDF, PostScript, rich text format, structured database formats, SGML, TeX, TrueType, XHTML, XML, XSL, Cascading Style Sheets, LaTeX, MuTeX, ASCII, EBCDIC, or AVI. Furthermore, for example, the content of the data objects 216A-216N can include a music file, e.g., MP3 or MIDI, a multimedia file, a streaming media file, a bitmap image, configuration files, account information, an executable image, or a digital rights management (DRM) object.

FIG. 3 is a block diagram illustrating one embodiment of the server computer 110 (FIG. 1). The server computer 110 includes a number of modules to prepare a response to a request, from either the client computer 115 or one of the IR systems 208A-208M, for one of the electronic resources that are maintained by the server computer 110.

In one embodiment of the invention, the server computer 110 includes a main engine 204 which maintains control over the processes within the server computer 110. The main engine 204 is in communication with a number of modules including a server interface module 218, an obfuscator module 220, a document generator module 222, an IR system database 224, a format templates module 226, a user database 228, a thesaurus module 232, a stem word extractor module 236, a semantic network module 240, a pattern recognition module 245 that is able to generate machine-readable tokens that represent patterns in audiovisual data objects, and a keyword extractor module 244.

As can be appreciated by one of ordinary skill in the art, each of the foregoing modules may comprise various subroutines, procedures, definitional statements, and macros. Each of the foregoing modules is typically separately compiled and linked into a single executable program. Therefore, the following description of each of the foregoing modules is used for convenience to describe the functionality of the server computer 110. Thus, the processes that are performed by selected ones of the modules may be arbitrarily redistributed to one of the other modules, combined together in a single module, made available in a shareable dynamic link library, or partitioned in any other logical way.

The foregoing modules may be written in any programming language such as C, C++, BASIC, Pascal, Java, and FORTRAN and run under a well-known operating system. C, C++, BASIC, Pascal, Java, and FORTRAN are industry standard programming languages for which many commercial compilers can be used to create executable code.

The server interface module 218 is responsible for initially receiving a network request from the client computer 115 and/or the IR systems 208A-208M and forwarding the request to the main engine 204. The document generator module 222 is responsible for dynamically generating an electronic document that comprises the index information for a respective one of the data objects 216A-216N. The obfuscator module 220 obfuscates the contents of selected ones of the data objects 216A-216N in response to a request from the main engine 204. The format templates module 226 maintains a plurality of templates that define the layout of one or more of the data objects 216A-216N.

The IR system database 224 maintains the indexing characteristics of one or more IR systems. For example, the IR system database 224 includes information as to whether an IR system performs stemming, recognizes the case of keywords, and recognizes duplicative words, as well as the number of words that are used by the IR system when indexing the electronic resource. In one embodiment of the invention, the indexing characteristics of the IR system are manually entered into the IR system database 224 by a system administrator at the server computer 110 in response to prompts by the server computer 110. In another embodiment of the invention, each of the IR systems automatically provides its indexing characteristic information in response to a request for such information. In yet another embodiment of the invention, each of the IR systems provides its indexing characteristics as part of the request for an electronic resource that is maintained by the server computer 110.
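The following sketch shows how such stored characteristics might drive per-IR-system customization of a keyword list. The `IRProfile` fields are hypothetical stand-ins for three of the characteristics named above (case recognition, duplicate handling, and the number of words indexed); stemming is left out for brevity, and the keyword list is assumed to be ordered most important first.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class IRProfile:
    """Hypothetical record of one IR system's indexing characteristics."""
    case_sensitive: bool      # does the IR system distinguish "Whale" from "whale"?
    removes_duplicates: bool  # does it discard repeated words when indexing?
    max_words: int            # how many words it reads from the index information

def tailor_keywords(keywords: List[str], profile: IRProfile) -> List[str]:
    """Adapt an ordered keyword list (most important first) to one IR system."""
    out: List[str] = []
    seen: Set[str] = set()
    for word in keywords:
        if not profile.case_sensitive:
            word = word.lower()  # case variants are indistinguishable to this system
        if profile.removes_duplicates:
            if word in seen:
                continue  # a repeat would be discarded, so don't spend a slot on it
            seen.add(word)
        out.append(word)
    return out[: profile.max_words]  # words past the limit would be ignored anyway

profile = IRProfile(case_sensitive=False, removes_duplicates=True, max_words=3)
print(tailor_keywords(["Whale", "whale", "Ship", "sea", "storm"], profile))  # ['whale', 'ship', 'sea']
```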

The user database 228 stores information regarding each of the users that have requested access to one of the data objects 216A-216N and/or have a license to access the data objects 216A-216N. One embodiment of the user database 228 is described in further detail below with respect to FIG. 4.

The thesaurus module 232 defines, for selected index words, a set of other related index words. Furthermore, the semantic network module 240 analyzes each of the data objects 216A-216N for its semantic meaning. The server computer 110 may optionally insert one or more index words that are provided by the thesaurus module 232 and/or the semantic network module 240 into the index information of the source data object.
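A minimal sketch of thesaurus-based expansion, assuming the thesaurus is a simple mapping from an index word to its related words. The entries and the function name are illustrative only; the text does not specify how the thesaurus module 232 stores its data.

```python
from typing import Dict, List

# Hypothetical thesaurus entries; a real module would load these from a data source.
THESAURUS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "buy": ["purchase"],
}

def expand_with_thesaurus(keywords: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Append related words for each keyword, skipping words already present."""
    result = list(keywords)
    present = set(keywords)
    for word in keywords:
        for related in thesaurus.get(word, []):
            if related not in present:
                result.append(related)
                present.add(related)
    return result

print(expand_with_thesaurus(["buy", "car"], THESAURUS))
# ['buy', 'car', 'purchase', 'automobile', 'vehicle']
```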

The keyword extractor module 244 prepares an initial set of index words based upon the contents of a selected one of the data objects 216A-216N. The keyword extractor module 244 determines whether any index information has already been prepared for the selected data object, or, alternatively, dynamically generates the index information for the selected data object. For example, if the selected data object is a music file, the keyword extractor module 244 can determine whether any index information is currently associated with the music file and/or scan the music to identify any words that are within the music. Furthermore, for example, if the selected data object is a bitmap image, the pattern recognition module 245 (FIG. 3) can use optical character recognition (OCR) software to identify any words that are used within the bitmap image and use those identified words as the index information for the bitmap image.
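For a text data object, one simple way to produce an initial set of index words is to rank the words that are not stop words by frequency. The sketch below assumes that approach; the small stop-word set and the `limit` parameter are hypothetical, and the text does not say which ranking the keyword extractor module 244 actually uses.

```python
import re
from collections import Counter
from typing import List

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # illustrative only

def extract_keywords(text: str, limit: int = 5) -> List[str]:
    """Return up to `limit` most frequent non-stop-words, ties in order of first use."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(limit)]

text = "The whale and the ship. The whale dives; the ship waits."
print(extract_keywords(text, 2))  # ['whale', 'ship']
```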

The main engine 204 is also connected to a stem list 238,
a hit list 250, a drop list 260, a case list 264, and a stop list
268. The stem list 238 describes, for one or more index
words, a corresponding stem of the word. The server
computer 110 may optionally reduce the overall size of the
index information by substituting a stem of an index word
for the index word. In one embodiment of the invention, the
server computer 110 removes selected prefixes and/or
suffixes from the index words to create the stemmed words.
For additional reference, information regarding stemming
can be found in M. F. Porter, An Algorithm for Suffix
Stripping, in Readings in Information Retrieval (Morgan
Kaufmann, 1997).
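The two stemming embodiments above (substitution from a stem list, and removal of affixes) can be combined as sketched below. This is a minimal illustration under stated assumptions: `STEM_LIST` is a hypothetical stand-in for the stem list 238, and the suffix table is a deliberately naive one, not the Porter algorithm cited above. Collapsing words that share a stem shows how stemming can shrink the index information.

```python
# Hypothetical stand-in for the stem list 238: index word -> stem.
STEM_LIST = {"indices": "index", "ran": "run"}

# Deliberately naive suffix table for illustration; this is NOT Porter's algorithm.
SUFFIXES = ("ingly", "edly", "ing", "ed", "es", "s")


def stem_word(word: str, min_stem: int = 3) -> str:
    """Stem one index word: stem-list lookup first, then naive suffix stripping."""
    lower = word.lower()
    if lower in STEM_LIST:
        return STEM_LIST[lower]
    for suffix in SUFFIXES:
        # Strip only if a stem of at least min_stem characters remains.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= min_stem:
            return lower[: -len(suffix)]
    return lower


def stem_index_words(words: list[str]) -> list[str]:
    """Replace each index word with its stem, keeping the first occurrence of each stem."""
    seen: set[str] = set()
    stems: list[str] = []
    for word in words:
        stem = stem_word(word)
        if stem not in seen:
            seen.add(stem)
            stems.append(stem)
    return stems


print(stem_index_words(["Searching", "searched", "searches", "indices"]))
# prints ['search', 'index']
```

A production system would use a tested stemmer; the naive table here over-strips some words (for example, "running" becomes "runn").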

The hit list 250 contains a list of words that are commonly
used by users when searching the IR systems. In one
embodiment of the invention, the hit list 250 is generated
over time. In this embodiment, with each request for an
electronic document, the client computer 115 provides to the
server computer 110 a list of the keywords that were used by
the user 102 when the user 102 searched for the source data
object via one of the IR systems 208A-208M. For example,
assuming the request is an HTTP request that was prepared
in response to a user selecting a "hit" that was displayed by
one of the IR systems, the browser 120 automatically
includes in the request the search terms that were used by the
user 102 in generating the hit. The server computer 110
accumulates and analyzes the keywords, thereby identifying
popular keywords that are used by users when searching
for the data objects 216A-216N.
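The accumulation of search keywords into a hit list can be sketched with a counter. This is a minimal sketch, assuming the search terms have already been extracted from each request as a list of strings; the `HitList` class and its method names are hypothetical, not taken from the source.

```python
from collections import Counter


class HitList:
    """Hypothetical stand-in for the hit list 250: counts search terms seen in requests."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record_request(self, search_terms: list[str]) -> None:
        # Normalize case so "DRM" and "drm" are counted as the same keyword.
        self.counts.update(term.lower() for term in search_terms)

    def popular(self, n: int) -> list[str]:
        """Return the n most frequently used keywords (ties keep first-seen order)."""
        return [term for term, _ in self.counts.most_common(n)]


hits = HitList()
hits.record_request(["DRM", "music"])
hits.record_request(["drm", "ebook"])
hits.record_request(["DRM"])
print(hits.popular(2))  # ['drm', 'music']
```

Lower-casing is a design choice for counting only; case-sensitive spellings are handled separately by the case list 264.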

Furthermore, in yet another embodiment of the invention, 
group hit lists (not shown) are maintained for groups of the 
data objects 112, each of the group hit lists describing 
popular words that were used by users to locate documents 
within the respective group. 

The drop list 260 includes a list of search words that are 
infrequently or never used by users when users search for the 
data objects 216A-216N via the IR systems 208A-208M. 
The server computer 110 may optionally remove one or 
more of the words from the index information for a selected 
data object if the words are found in the drop list 260. 
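One way to derive and apply a drop list is sketched below. The threshold policy in `build_drop_list` (a word counts as "infrequently or never used" if its search count is below `min_count`) is an assumption for illustration; the source does not specify how the drop list is populated.

```python
from collections.abc import Iterable, Mapping


def build_drop_list(vocabulary: Iterable[str], search_counts: Mapping[str, int],
                    min_count: int = 1) -> set[str]:
    """Hypothetical policy: drop words searched fewer than min_count times."""
    return {w.lower() for w in vocabulary if search_counts.get(w.lower(), 0) < min_count}


def apply_drop_list(index_words: list[str], drop_list: set[str]) -> list[str]:
    """Remove index words that appear in the drop list (case-insensitive)."""
    return [w for w in index_words if w.lower() not in drop_list]


counts = {"drm": 5, "music": 2}
words = ["DRM", "music", "zebra"]
print(apply_drop_list(words, build_drop_list(words, counts)))  # ['DRM', 'music']
```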

The case list 264 includes a list of search words that have 
more than one associated spelling using different cases, e.g., 
IBM, ibm. If the requesting IR system is case sensitive, the 
server computer 110 can optionally add one or more words 
from the case list to the index information for the source data 
object. 
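Adding case variants only for a case-sensitive IR system can be sketched as follows. The shape of `CASE_LIST` (a lowercase key mapped to its known spellings) and its contents are assumptions for illustration.

```python
def add_case_variants(index_words: list[str], case_list: dict[str, list[str]],
                      case_sensitive: bool) -> list[str]:
    """Append case variants from a case list when the requesting IR system is case sensitive."""
    result = list(index_words)
    if not case_sensitive:
        return result  # a case-insensitive IR system gains nothing from extra spellings
    present = set(result)
    for word in index_words:
        for variant in case_list.get(word.lower(), []):
            if variant not in present:
                present.add(variant)
                result.append(variant)
    return result


CASE_LIST = {"ibm": ["IBM", "ibm"]}  # hypothetical contents, keyed by lowercase form
print(add_case_variants(["IBM", "laptop"], CASE_LIST, case_sensitive=True))
# prints ['IBM', 'laptop', 'ibm']
```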

The stop list 268 includes a list of stop words which are 
removed from the index information for the source data 
object. The stop words are those words that should not be 
included in the index information because: (i) the words 
have special meaning to the IR system since they are part of 
a search grammar, (ii) the words occur so often that the 
words are considered to be of little relevance, and/or (iii) the 
provider of the data objects 216A-216N has decided to 
remove the words from the index information for personal or 
business reasons, such as privacy. FIG. 18 illustrates the 
contents of an exemplary stop list 268. 

FIG. 4 is a high-level block diagram illustrating in further
detail some of the data items that are stored in the user
database 228. In one embodiment of the invention, a record
308 is maintained for each of the users. The record 308
includes control rights 312, a history log 316, and a user
profile 320. The control rights 312 specify the rights of the
user with respect to one or more of the data objects
216A-216N. In one embodiment of the invention, the control
rights 312 specify the rights of the user with respect to a
group of the data objects 216A-216N.

The control rights 312 can include various items, such as: 
the right to print, copy, view, edit, execute, delete, and merge 
with another data object. Further, the control rights 312 can 
also specify a number of uses with respect to each of the 
control rights. For example, the control rights can specify 
that the user is allowed to print a selected one of the data 
objects, such as data object 216B, five times. In another
embodiment of the invention, the control rights 312 may be
applied to a group or all of the users. In another embodiment
of the invention, the control rights may be integrated with
one or more of the data objects 216A-216N.
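Control rights with a per-right number of uses, such as permission to print data object 216B five times, can be modeled as a table of remaining counts. This is a minimal sketch; the `ControlRights` class is hypothetical, and treating `None` as "unlimited uses" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ControlRights:
    """One user's rights to one data object; None means unlimited uses (an assumption)."""
    remaining: Dict[str, Optional[int]] = field(default_factory=dict)

    def use(self, right: str) -> bool:
        """Consume one use of `right`; return False if it was never granted or is exhausted."""
        if right not in self.remaining:
            return False
        left = self.remaining[right]
        if left is None:
            return True
        if left <= 0:
            return False
        self.remaining[right] = left - 1
        return True


rights = ControlRights({"print": 5, "view": None})  # e.g., print data object 216B five times
```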

The history log 316 maintains a transaction history of
each of the data objects 216A-216N that have been
requested by the user, as well as the search terms that
were used by the user to identify the data objects
216A-216N. In one embodiment of the invention, the history
logs of each of the users are consolidated into a master
history log 324.

The user profile 320 includes information regarding the 
personal preferences of the user. For example, the user 
profile 320 can include one or more templates that are
preferred by the user when viewing the data objects. 
Additionally, the user profile 320 can include a national 
language that is preferred by the user, e.g., English, German, 
French, Swedish. 


OPERATION FLOW 

FIG. 5 is a high-level flowchart illustrating a process for
generating an electronic document. After starting at a state
400, the process flow moves to a state 404, wherein a
requester requests an electronic document that is associated
with a specified URL. In one embodiment of the invention,
the network request for the electronic document is an HTTP
request for a document that is associated with a selected
URL.

After receiving the network request from either the user
client computer 115 or one of the IR systems 208A-208M,
the process proceeds to a state 408, wherein the server
computer 110 dynamically generates an electronic document
that provides index or other descriptive information
regarding the source data object that is associated with the
request or, alternatively, retrieves the data object that is
associated with the specified URL.

The process for providing an electronic document or data
object is described in further detail below with respect to
FIG. 6. In brief, however, the process is as follows. If the
server computer 110 determines that the requester is
authorized to access the data object that is associated with
the specified URL, the server computer 110 transmits the
source data object that is associated with the request.
However, if the requester is not authorized to access the
data object, the server computer 110 generates a customized
electronic document based upon whether the requester is one
of the IR systems 208A-208M (FIG. 2) or another type of
user, such as the client computer 115 (FIG. 2). If the
requester is one of the IR systems 208A-208M, the server
computer 110 generates an electronic document that includes
the index information for the source data object.

If the requester is the client 115, the server computer 110
generates an electronic document that describes for the user
the steps that the user must perform to obtain access to the
source data object. After completing state 408, the process
flow moves to an end state 412, wherein the server computer
110 waits for further document requests from the network
116.
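The brief description of state 408 reduces to a three-way decision, sketched below. The `Response` enum and `choose_response` function are hypothetical names; the sketch assumes the authorization and requester-type checks have already been made.

```python
from enum import Enum


class Response(Enum):
    SOURCE_OBJECT = "transmit the source data object"
    INDEX_DOCUMENT = "generate a document of index information"
    ACCESS_INSTRUCTIONS = "generate a document describing how to obtain access"


def choose_response(authorized: bool, is_ir_system: bool) -> Response:
    """Mirror the brief description of state 408."""
    if authorized:
        return Response.SOURCE_OBJECT
    if is_ir_system:
        return Response.INDEX_DOCUMENT
    return Response.ACCESS_INSTRUCTIONS
```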

FIG. 6 is a flowchart illustrating in further detail one 
embodiment of a process for providing a response to a 
request for an electronic resource that is maintained by the 
server computer 110. FIG. 6 illustrates in further detail the 
acts that occur within state 408 of FIG. 5. It is noted that,
depending on the embodiment, selected steps of FIG. 6 may 
be omitted and that other steps may be added. 
After starting at a start state 504, the process flow
proceeds to a decision state 506. At the decision state 506, the
server computer 110 determines whether the requester of the
data object is one of the IR systems 208A-208M or,
alternatively, the client computer 115. To determine the
identity of the requester, the server computer 110 analyzes
the electronic request (received in state 404 of FIG. 5) for a
requester identifier. The requester identifier can be a unique
value or a digital signature that is associated with the
requester.
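A requester identifier backed by a signature can be checked as sketched below. This is a sketch under loud assumptions: `IR_SYSTEM_KEYS` is a hypothetical registry, and an HMAC-SHA256 tag over a shared secret is swapped in for the "digital signature" the text mentions (a true digital signature would use public-key cryptography). Unknown or unverified requesters are treated as ordinary clients.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical registry mapping IR-system identifiers to shared secret keys.
IR_SYSTEM_KEYS = {"ir-208a": b"example-secret"}


def identify_requester(requester_id: Optional[str], signature: Optional[str],
                       payload: bytes) -> str:
    """Return "ir_system" only for a known identifier with a valid HMAC-SHA256 tag, else "client"."""
    key = IR_SYSTEM_KEYS.get(requester_id) if requester_id else None
    if key is None or signature is None:
        return "client"
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during the comparison.
    return "ir_system" if hmac.compare_digest(expected, signature) else "client"
```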

If the server computer 110 determines that the requester is 
an IR system, the server computer 110 proceeds to a state 
508 wherein the server computer 110 (FIG. 2) determines 
whether all or selected portions of the source data object that 
is associated with the request should be converted into index 
information. If the server computer 110 determines that 
selected portions of the data object should be converted into 
machine readable text, the server computer 110 proceeds to 
a state 512. 

At the state 512, the server computer 110 converts all or
selected portions of the source data object that is associated
with the request into machine-readable characters that will
collectively comprise an initial set of index information for
the source data object. For example, if the source data object
comprises a music file, the server computer 110 may parse
the music file to identify any words that are included within
the lyrics of the music. As another example, if the source
data object is a bitmap image, the server computer 110 may
use optical character recognition software to identify one or
more textual elements within the bitmap image. Furthermore,
if the source data object is a multimedia and/or a streaming
media file, the server computer 110 may read and store any
closed-caption information that is associated with the file
or, alternatively, employ one or more of the above-described
conversion techniques. Furthermore, if the source data object
comprises text in another language, the server computer 110
can convert all or selected portions of the source data object
into another language, such as English.

In one embodiment of the invention, the server computer 
110 maintains a list which describes one or more conversion 
processes to be employed with respect to the source data 
object. In another embodiment of the invention, the conver- 
sion information is predefined and stored within the source 
data object or at another known location. 
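A list of conversion processes can be applied in order, as sketched below. This is an illustration under assumptions: the `CONVERTERS` registry and its names are hypothetical, and closed captions are assumed to arrive as SubRip (SRT) text, from which cue numbers and timing lines are discarded so that only the caption words remain.

```python
import re
from typing import Callable, Dict, List

# Matches an SRT-style timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def captions_to_text(captions: str) -> str:
    """Keep only the caption text, dropping cue numbers, timing lines, and blank lines."""
    kept = []
    for line in captions.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


# Hypothetical registry of conversion processes, addressed by name.
CONVERTERS: Dict[str, Callable[[str], str]] = {
    "captions": captions_to_text,
    "lowercase": str.lower,
}


def convert(content: str, conversion_list: List[str]) -> str:
    """Apply the conversion processes named in conversion_list, in order."""
    for name in conversion_list:
        content = CONVERTERS[name](content)
    return content
```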

If at the decision state 508 the server computer 110
determines not to convert the source data object or,
alternatively, after completing the state 512, the process
proceeds to a state 514. At the state 514, the server computer
110 selects the index information for the source data object.
The index information can include the selected textual
portions of the source data object, such as those converted at
state 512, or, alternatively, portions of the source data object
that are already in textual form. In one embodiment of the
invention, the server computer 110 maintains predefined
index information that is associated with the source data
object. The predefined index information can be stored in
one of several locations, including: a file on the server
computer 110, a predefined section of the source data object,
a predefined location on a remote computer, or a location on
the network that is identified by the source data object.

Continuing to a decision state 516, the server computer
110 (FIG. 1) determines whether to create multiple
electronic documents based upon the index information for
the source data object. The provider of the data object may
desire to export multiple electronic documents of index
information, each of the electronic documents being directed
to a selected portion of the data object. If the server
computer 110 determines that multiple documents are to be
created, the server computer 110 proceeds to a state 518.

At the state 518, the server computer 110 (FIG. 1)
partitions the index information into two or more sections.
In one embodiment of the invention, the source data object
includes its partition information. In another embodiment of
the invention, the server computer 110 dynamically analyzes
the source data object so as to identify one or more
partitions. For example, if the source data object comprises a
number of songs, the server computer 110 can partition the
source data object based upon each of the songs.
Furthermore, for example, with reference to FIG. 8, if the
source data object comprises an electronic book 600, the
server computer 110 can partition the source data object into
one or more sections 604, each of the sections being based
upon one of the chapters of the book. To facilitate traversal
of the web documents by a spider, the server computer 110
may optionally include in the body of each of the electronic
documents a link to one or more of the other partitions.
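Chapter-based partitioning with cross-links for a spider can be sketched as below. Assumptions: chapters begin on lines starting with the word "Chapter", the section text stands in for each partition's index information, and the URL pattern `/index/part-{}.html` is hypothetical.

```python
import re
from html import escape
from typing import List, Tuple


def partition_by_chapter(text: str) -> List[Tuple[str, str]]:
    """Split text into (title, body) sections at lines that start with "Chapter"."""
    sections = []
    # Zero-width split before each "Chapter" line (needs Python 3.7+).
    for part in re.split(r"(?m)^(?=Chapter\b)", text):
        part = part.strip()
        if not part:
            continue
        title, _, body = part.partition("\n")
        sections.append((title.strip(), body.strip()))
    return sections


def build_partition_pages(sections: List[Tuple[str, str]],
                          url_pattern: str = "/index/part-{}.html") -> List[str]:
    """Build one HTML page per section, each linking to every other partition."""
    urls = [url_pattern.format(i) for i in range(len(sections))]
    pages = []
    for i, (title, body) in enumerate(sections):
        links = " ".join(
            f'<a href="{urls[j]}">{escape(other_title)}</a>'
            for j, (other_title, _) in enumerate(sections)
            if j != i
        )
        pages.append(
            f"<html><head><title>{escape(title)}</title></head>"
            f"<body><p>{escape(body)}</p><p>{links}</p></body></html>"
        )
    return pages
```

Linking every partition to every other keeps the sketch simple; a large book might link only to neighboring chapters and a table of contents.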

If at the decision state 516 the server computer 110
determines not to create multiple documents of index
information or, alternatively, after completing state 518,
process flow proceeds to a decision state 520. At the state
520, the server computer 110 determines whether to
obfuscate the index information. In one embodiment of the
invention, each of the data objects 216A-216N (FIG. 1) may
designate whether the index information should be
obfuscated. In another embodiment of the invention, a flag
indicating whether the data object should be obfuscated is
stored in a predefined location, such as on the server or
another computer that is connected to the server via the
network 116 (FIG. 2).

If the server computer 110 (FIG. 2) decides to obfuscate 
the index information, the server computer 110 proceeds to 
a state 528. At the state 528, the server computer 110 
obfuscates the index information. The obfuscation process is 
described in further detail below with reference to FIG. 10.
However, in brief, the obfuscation process modifies the 
index information such that if the index information was 
viewed by a user, the user would not be able to easily 
reconstruct the original content of the source data object. 

Referring again to the decision state 520, if the index
information is already obfuscated or if obfuscation is not
desired, or after completion of the state 528, the server
computer 110 proceeds to a state 532. At the state 532, the
server computer 110 dynamically generates a header and
body for an electronic document using the prepared index
information. The process for dynamically generating the
electronic document is described in further detail below with
reference to FIG. 11.

The server computer 110 then proceeds to an end state 536,
waiting for additional electronic resource requests. Once a
request is received, the process flow starts again at the state
400 (FIG. 5).

Referring again to the decision state 506, if the server
computer 110 (FIG. 1) determines that the requester is the
user 102 (FIG. 1), the server computer 110 proceeds to a
decision state 540. At the decision state 540, the server
computer 110 determines whether the user 102 is authorized
to access the source data object that is associated with the
requested electronic resource. In one embodiment of the
invention, the server computer 110 determines the identity of
the user by examining the user information that was
provided by the client computer 115 as part of the request for
the electronic resources. For example, in an HTTP request,
user authentication can be performed using HTTP
Authentication, e.g., RFC 2617 as described at
<http://www.ietf.org/rfc/rfc2617.txt>. The server computer 110
may also optionally display an authorization screen wherein
the user 102 is requested to provide identifying information,
a password, or a digital signature. Upon identifying the user
102, the server computer 110 examines the control rights 312
(FIG. 4) that are associated with the user to determine the
access rights of the user 102. In another embodiment of the
invention, the server computer 110 displays a description of
the source data object and a hyperlink to an authentication
server (not shown). If the user selects the hyperlink, the
authentication server determines whether the user is allowed
access to the source data object.

If the server computer 110 (FIG. 1) determines that the
user 102 is authorized to access the data object, the server
computer 110 proceeds to a state 544. At the state 544, the
server computer 110 checks the format templates module 266
to see if the source data object has an associated format
template. If the source data object has an associated format
template, the server computer 110 formats the source data
object according to the specifications of the associated
format template. The server computer 110 then transmits the
source data object to the client computer 115. If the source
data object is a streaming media file, the server computer
110 streams the content of the data object to the client
computer 115 (FIG. 1).

Continuing to a state 548, the server computer 110 stores 
one or more items of user information. For example, the user 
information can include: the name of the user 102, an 
identifier that is associated with the user, the time the data 
object was transmitted to the user, and one or more search 
words that were used by the user 102 to locate the electronic 
resource. Next, the server computer 110 moves to the end 
state 536 and waits for additional electronic resource 
requests. 

Referring again to the decision state 540, if the server
computer 110 (FIG. 1) determines that the user 102 (FIG. 1)
is not authorized to access the source data object, the server
computer 110 proceeds to a state 700 (FIG. 7) via off page
connector "A." At the state 700, the server computer 110 
generates an electronic document that will describe to the 
user 102 what steps the user 102 should take to become 
authorized to access the source data object. At the state 700, 
the server computer 110 generates a header and body for the 
electronic document. 

With respect to FIG. 9, an illustrative electronic document
900 is shown that includes a brief description 904 of the
source data object, payment information 908 for the source
data object, and an acceptance selector 916. The acceptance
selector 916 is an icon, such as a button, which the user can
select to indicate approval and acceptance of the conditions
of the payment information 908.

Continuing to a decision state 704, the server computer 
110 determines whether the user 102 agrees to the conditions 
of access that were specified in the electronic document 
(prepared in state 700). If the user 102 (FIG. 1) agrees to the 
access conditions, the server computer 110 proceeds to the 
state 544 (FIG. 6) via off page connector "B." State 544 is 
described in further detail above. However, if the user 102
does not agree to the access conditions, the server computer
110 proceeds to the state 548 (FIG. 6) via off page connector
"C." State 548 is described in further detail above.

It is noted that in one embodiment of the invention, one
or more of the states shown in FIGS. 6 and 7 can occur in
a pre-processing stage prior to receiving requests for the
electronic resource from the client computer 115 or one of
the IR systems 208A-208M. For example, data object
conversion (state 512), index information partitioning (state
518), index information obfuscation (state 528), and
generation of electronic documents (states 532 and 700) can
occur, if desired, prior to receiving a request for one of the
data objects 216A-216N.

FIG. 10 is a high-level flowchart illustrating a process of
obfuscating index information. FIG. 10 illustrates in further
detail the state 528 of FIG. 6. In one embodiment of the
invention, prior to traversing the states of FIG. 10, the server
computer 110 has received a request for an electronic
resource at a selected URL. Furthermore, the server
computer 110 has identified a source data object that is
associated with the selected URL, and the server computer 110
has prepared a putative set of index information for the
source data object. The putative set of index information may
have come from one of the data objects 216A-216N, an
indexing file that is associated with the source data object, or
some other source. The obfuscating process transforms the
index information in such a way as to obscure or confuse the
meaning of the information without interfering with the
ability of an IR system to properly index and retrieve the
electronic document.

After starting at a state 1000, the server computer 110
(FIG. 1) proceeds to a state 1004, wherein the server
computer 110 parses the content of the index information. At
the state 1004, the server computer 110 "tokenizes," via a
tokenizer, each of the words in the index information.
Tokenizing refers to separating the index information into
groups of words, "tokens," based upon a delimiter that
depends upon the indexing characteristics of the requesting
IR system. The delimiter can include white space, e.g., a
space, a carriage return, or a tab, or, alternatively, can be a
word from the stop list 268 (FIG. 2). If the requesting IR
system recognizes phrases (as indicated by the information
retrieval database 224), the server computer 110 parses the
index information based upon the words in the stop list 268,
thereby creating a plurality of tokens, each of the tokens
having one or more words. Otherwise, if the requesting IR
system does not recognize phrases, the server computer 110
parses the index information based upon white space that is
within the index information.
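The two tokenizing modes of state 1004 can be sketched as follows. Assumptions: `recognizes_phrases` stands in for the flag read from the information retrieval database 224, `stop_list` is a set of lowercase stop words standing in for the stop list 268, and punctuation handling is omitted for brevity.

```python
from typing import List, Set


def tokenize(index_info: str, recognizes_phrases: bool, stop_list: Set[str]) -> List[str]:
    """Split index information into tokens per the requesting IR system's characteristics.

    Phrase-aware systems: stop-list words act as delimiters, so each token is a
    run of one or more non-stop words. Otherwise: split on white space.
    """
    words = index_info.split()  # splits on spaces, tabs, and newlines
    if not recognizes_phrases:
        return words
    tokens: List[str] = []
    current: List[str] = []
    for word in words:
        if word.lower() in stop_list:
            if current:
                tokens.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        tokens.append(" ".join(current))
    return tokens


stop = {"the", "and", "of"}
print(tokenize("the quick brown fox and the lazy dog", True, stop))
# prints ['quick brown fox', 'lazy dog']
```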

Continuing to a state 1008, the server computer 110
removes selected tokens from the index information. In one 
embodiment of the invention, the server computer 110 
removes from the index information each of the tokens that 
are listed within the stop list 268. 

For example, FIG. 13 illustrates an exemplary data object
1300, wherein the data object comprises an HTML
document. Assuming that the contents of the exemplary data
object 1300 comprised the putative set of index information,
after completing the state 1008, as is shown in FIG. 13, the
server computer 110 has removed one or more of the tokens
that are listed within the stop list 268. FIG. 14 illustrates an
exemplary set of tokens that remain after the server
computer 110 has removed selected tokens from the
exemplary data shown in FIG. 13.

Moving to a state 1012 (FIG. 10), the server computer
110 may optionally insert one or more selected tokens into
the index information. In one embodiment of the invention,
the server computer 110 replaces one or more of the tokens
that were discarded in state 1008 with a randomly selected
token from the stop list 268. The server computer 110 may
optionally elect to insert random tokens from the stop list
268 even though no words were discarded in state 1008.
Continuing the example from above, FIG. 15 illustrates the
contents of the index information shown in FIG. 14 after
selected tokens have been added to the index information.
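States 1008 and 1012 together can be sketched as a single pass. Assumptions: the function name and the `replace_prob` and `extra` parameters are hypothetical, and a seeded `random.Random` is passed in so that results are reproducible.

```python
import random
from typing import List, Set


def remove_and_replace(tokens: List[str], stop_list: Set[str], rng: random.Random,
                       replace_prob: float = 0.5, extra: int = 0) -> List[str]:
    """Sketch of states 1008 and 1012.

    Each stop-list token is discarded; with probability replace_prob a randomly
    chosen stop-list token takes its place. Then `extra` random stop-list tokens
    are inserted at random positions, even if nothing was discarded.
    """
    pool = sorted(stop_list)  # fixed order so a seeded rng gives repeatable output
    out: List[str] = []
    for token in tokens:
        if token.lower() in stop_list:
            if rng.random() < replace_prob:
                out.append(rng.choice(pool))
        else:
            out.append(token)
    for _ in range(extra):
        out.insert(rng.randint(0, len(out)), rng.choice(pool))
    return out
```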

Next, at a state 1016, the server computer 110 optionally
randomizes, via a randomizer, the order of adjacent tokens.
The tokens are randomized by selecting a predetermined
number of tokens from the output of the previous steps (in
the order they were parsed) and then randomizing the order
of those tokens. The number of tokens that is gathered in
each pass is known as the randomness factor. The greater the
randomness factor, the greater the impact on IR systems that
evaluate the proximity of words. If the server computer 110
uses a stop list 268 that has a large number of tokens, the
index information may be adequately obfuscated by the
removal of the words that are in the stop list 268, and the
randomization step may be omitted.

Still referring to the state 1016, in another embodiment of
the invention, the order of the tokens is reversed via a token
order reverser. If the order of the tokens is reversed, the
index information will be slightly more obfuscated than
otherwise; however, this reversal may reduce the recall and
precision of IR systems that consider word order. FIG. 16
illustrates the contents of the index information after the
contents of the index information shown in FIG. 15 have been
randomized. Next, at a state 1020, the obfuscation process
ends.
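The windowed randomization of state 1016, and the alternative reversal embodiment, can be sketched as follows. The function names are hypothetical; `randomness_factor` is the window size described above, and a caller-supplied `random.Random` keeps the output reproducible.

```python
import random
from typing import List


def randomize_tokens(tokens: List[str], randomness_factor: int,
                     rng: random.Random) -> List[str]:
    """Shuffle tokens within consecutive windows of `randomness_factor` tokens.

    Tokens never move out of their window, so a larger factor disturbs word
    proximity more, matching the trade-off described for state 1016.
    """
    if randomness_factor < 1:
        raise ValueError("randomness_factor must be at least 1")
    out: List[str] = []
    for start in range(0, len(tokens), randomness_factor):
        window = tokens[start:start + randomness_factor]  # a copy; input is untouched
        rng.shuffle(window)
        out.extend(window)
    return out


def reverse_tokens(tokens: List[str]) -> List[str]:
    """Alternative embodiment: reverse the token order."""
    return tokens[::-1]
```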

FIGS. 11 and 12 are collectively a flowchart illustrating a 
process of dynamically customizing the index information 
for the source data object. FIGS. 11 and 12 further illustrate 
the states that are within state 532 of FIG. 6. In one 
embodiment, prior to entering the states shown in FIGS. 11 
and 12, the server computer 110 has determined that it has 
received a request for an electronic resource at a selected 
URL from one of the IR systems 208A-208M. In another 
embodiment, the server computer 110 is preprocessing a 
selected data object and is customizing the index
information in preparation for a future request. Furthermore,
the server computer 110 has prepared a putative set of index
information that may optionally be obfuscated by the
process shown in FIG. 10.
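The header generation of state 1104, described in the paragraph that follows, can be sketched as a function that emits keywords and description META tags. Assumptions: `build_index_document` and `max_keyword_chars` are hypothetical names, and the character budget reflects the observation below that some IR systems analyze only the first 100 characters or so of the keyword list. Tag names are written in lowercase, which HTML treats the same as uppercase.

```python
from html import escape
from typing import List, Optional


def build_index_document(title: str, index_words: List[str], description: str,
                         max_keyword_chars: Optional[int] = None) -> str:
    """Build a minimal HTML document whose header carries keywords and description META tags.

    If max_keyword_chars is set, keywords are added in order until the
    comma-separated list would exceed that many characters.
    """
    keywords: List[str] = []
    used = 0
    for word in index_words:
        cost = len(word) + (2 if keywords else 0)  # account for the ", " separator
        if max_keyword_chars is not None and used + cost > max_keyword_chars:
            break
        keywords.append(word)
        used += cost
    return (
        "<html><head>"
        f"<title>{escape(title)}</title>"
        f'<meta name="keywords" content="{escape(", ".join(keywords))}">'
        f'<meta name="description" content="{escape(description)}">'
        "</head>"
        f"<body><p>{escape(' '.join(index_words))}</p></body></html>"
    )
```

Stopping at the first keyword that does not fit preserves the priority order of the index words; stemming them first, as described below, would let more words fit within the same budget.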

After starting at a start state 1100, the server computer 110
(FIG. 1) proceeds to a state 1104. At the state 1104, the
server computer 110 (FIG. 1) dynamically generates an
initial header and body for the requested electronic
document based upon the contents of the putative set of index
information. In one embodiment of the invention, the header
and the body of the electronic document comprise each of
the words in the putative set of index information. For
example, assuming the electronic document is an HTML
document, the server computer 110 can insert each of the
words in the putative set of index information into the
keywords section of the header. The server computer 110
inserts the tag <META Name="keywords" Content="Key Word List">,
wherein Key Word List is a list of each
of the words, into the header portion of the electronic
document. Furthermore, the server computer 110 can
optionally insert one or more words in the "description"
section of the header. In HTML, the description metatag
allows IR systems to display an intelligible excerpt regarding
the content of the document beneath the title of the electronic
document. The server computer 110 may optionally insert
one or more words from the putative set of index information
and/or a description that is associated with the data
object in the body of the electronic document. Optionally,
depending on the indexing characteristics of the requesting
IR system, if index information is to be included in the body
of the electronic document, the server computer 110 can set
the font of the text within the body portion to be displayed
using a white font on a white background to provide a
more user-friendly display of the electronic document.
However, if the requesting IR system ignores text having a
font color that is the same as the background, the server
computer 110 does not employ this technique.

Moving to a decision state 1112, the server computer 110
determines whether to perform "stemming" with respect to
the index information. Stemming refers to the process of
truncating one or more of the words that comprise the index
information. In one embodiment of the invention, the
determination of whether to perform stemming is based
upon the indexing characteristics of the requesting IR
system. It is noted that for some electronic document formats,
the header portion of the electronic document can only
store a limited number of characters. Furthermore, some IR
systems only analyze a selected portion of the header, e.g.,
the first 100 characters in the index information portion of
the header. For these electronic document formats and IR
systems, the server computer 110 advantageously attempts
to maximize the number of index words that are included
within the header. By stemming one or more of the index
words that are within the header, the server computer 110
reduces the total character count of the index words, thereby
leaving space for one or more index words to be added to the
header of the electronic document.

If the server computer 110 (FIG. 1) determines to perform
stemming, the server computer 110 proceeds to a state 1116.
At the state 1116, the server computer 110 stems the words
in the index information. In one embodiment of the
invention, the server computer 110 substitutes one or more
words from the index information with a corresponding
word from the stem list 238. In another embodiment of the
invention, the server computer 110 removes selected
prefixes and/or suffixes from the index words to create the
stemmed words.

Referring again to the decision state 1112, if the server
computer 110 (FIG. 1) determines not to perform stemming,
or, alternatively, from the state 1116, the server computer
110 proceeds to a decision state 1120. At the state 1120, the
server computer 110 determines whether to insert one or
more words into the header and/or body of the electronic
document using words from the case list 264. In one
embodiment of the invention, the determination whether to
insert one or more words from the case list 264 is based upon
the indexing characteristics of the requesting IR system.

If the server computer 110 determines to add one or more
words from the case list 264, the server computer 110
proceeds to a state 1124. At the state 1124, the server
computer 110 reads the case list 264. Continuing to a
decision state 1128, the server computer 110 determines
whether one or more words in the case list 264 are also
included within the electronic document. If the server
computer 110 identifies one or more words in the case list
264 that are also in the electronic document, the server
computer 110 proceeds to a state 1132. At the state 1132, the
server computer 110 inserts one or more words from the case
list 264 into the electronic document.

If at the decision state 1120 the server computer 110
determines not to add one or more words from the case list 264,
or, if at the decision state 1128 no words were identified in
[…]
words. The selected classification can include duplicative
words, adjectives, adverbs, nouns, pronouns, or verbs. In
one embodiment of the invention, the determination whether
to remove a selected classification of words is based upon
the indexing characteristics of the requesting IR system. In
another embodiment of the invention, the determination
whether to remove a selected classification of words is based
upon the preference of the provider of the source data object.
It is noted that more than one classification of words may be
removed.

For example, if the requesting IR system does not place
additional weight on index words that are duplicative, the
server computer 110 can decide to remove the duplicative
words to make space in the index information for other,
nonduplicative words. Furthermore, for example, the server
computer 110 can remove adjectives from the index
information to increase the obfuscation of the index
information and also to increase space in the index
information for other, potentially more meaningful, index
information.

If the server computer 110 determines to remove a
selected classification of words, the server computer 110
proceeds to a state 1140. At the state 1140, the server
computer 110 removes the selected classification of words
from the index information.

Referring again to the decision whether to remove a
classification of words, if the server computer 110 (FIG. 1)
determines not to remove a classification of words, or,
alternatively, after completing the state 1140, the server
computer 110 proceeds to a decision state 1144. At the
decision state 1144, the server computer 110 determines
whether to add one or more words to the electronic document
that are common to a group of documents. The server
computer 110 may determine that even though a word was
not one of the words of the source data object (and therefore
not one of the current index words in the electronic
document), the word should be added since it is found in one
or more data objects that are related to the source data object.
If the server computer 110 determines to add one or more of
the common words, the server computer 110 proceeds to a
state 1148. At the state 1148, the server computer 110 inserts
one or more of the common words into the electronic
document.

Referring again to the decision state 1144, if the server
computer 110 (FIG. 1) determines not to add common words
to the electronic document or, alternatively, after completing
the state 1148, the server computer 110 proceeds to a state
1208 (FIG. 12) via off page connector "D." At the state 1208,
the server computer 110 determines whether to add one or
more words from the thesaurus module 232 (FIG. 3).

If the server computer 110 determines to add one or more
words from the thesaurus 232, the server computer 110
proceeds to a state 1212. At the state 1212, the server
computer 110 identifies one or more words from the
thesaurus 232 that have a similar meaning to one or more of
the index words in the electronic document. In one
embodiment of the invention, the server computer 110 checks
the thesaurus module 232 for each of the words that are
within the electronic document. In another embodiment of the
invention, the server computer 110 only checks the
thesaurus module 232 for words that are found multiple times
within the index information. In yet another embodiment of
the invention, the server computer 110 only checks the

the electronic document that were in the case list 264, or thesaurus module 232 for the words that were added in the 

after completing the state 1132, the server computer 110 state 1148. In yet another embodiment of the invention, the 

proceeds to a decision state 1136 . 65 serve r computer 110 checks the thesaurus module 232 for 

At the decision state 1136, the server computer 110 those words that were removed at the state 1140. After 

determines whether to remove a selected classification of identifing one or more related words via the thesaurus 
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module 232, the server 110 inserts the identified words into 
the electronic document. 
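The stemming, common-word, and thesaurus steps described above (states 1116, 1148, and 1212) can be illustrated with a minimal Python sketch. It assumes that the stem list 238 and the thesaurus module 232 are plain dictionaries, that a word counts as "common" when it appears in at least a hypothetical `min_docs` number of related data objects, and that suffix removal uses a small hypothetical suffix table. The names `stem_words`, `common_words`, `thesaurus_words`, and `build_index` are illustrative only and do not come from the specification. The thesaurus is checked for every index word, which corresponds to the first embodiment described for the state 1212.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical suffix table for the prefix/suffix-removal embodiment.
SUFFIXES = ("ing", "ed", "es", "s")


def stem_words(words: List[str], stem_list: Dict[str, str]) -> List[str]:
    """State 1116: use the stem list when it has an entry, else strip a suffix."""
    stemmed = []
    for word in words:
        if word in stem_list:
            stemmed.append(stem_list[word])
            continue
        for suffix in SUFFIXES:
            # Require a root of at least three letters so short words survive.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        stemmed.append(word)
    return stemmed


def common_words(related_docs: Iterable[Set[str]], min_docs: int) -> Set[str]:
    """Words found in at least min_docs of the related data objects."""
    counts = Counter(word for doc in related_docs for word in doc)
    return {word for word, n in counts.items() if n >= min_docs}


def thesaurus_words(words: Iterable[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """State 1212: collect similar-meaning words, keeping first-seen order."""
    found: List[str] = []
    for word in words:
        for synonym in thesaurus.get(word, []):
            if synonym not in found:
                found.append(synonym)
    return found


def build_index(words, stem_list, related_docs, thesaurus, min_docs=2):
    index = stem_words(words, stem_list)
    # State 1148: add common words that the source object does not contain.
    index.extend(sorted(common_words(related_docs, min_docs) - set(index)))
    # Check every index word against the thesaurus and append new words.
    for synonym in thesaurus_words(index, thesaurus):
        if synonym not in index:
            index.append(synonym)
    return index


print(build_index(
    ["running", "indexes", "data", "songs"],
    {"running": "run"},
    [{"song", "album", "artist"}, {"album", "artist", "tour"}],
    {"song": ["tune"], "album": ["record"]},
))
# ['run', 'index', 'data', 'song', 'album', 'artist', 'tune', 'record']
```

A real system would more likely use an established stemmer than a four-entry suffix table; the table is kept tiny here so the control flow of the two stemming embodiments stays visible.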

If the server computer 110 (FIG. 1) determines not to add
one or more words from the thesaurus module 232, or
alternatively, after completing the state 1212, the server
computer 110 proceeds to a decision state 1216. At the
decision state 1216, the server computer 110 determines
whether to add one or more words from any hit lists, such as
the hit list 250 (FIG. 3), that may be associated with the data
object. The server computer 110 can determine whether to
apply a hit list on a data object-by-data object basis, or
alternatively, on a group-by-group of data objects basis.

If the server computer 110 determines to add one or more
words from the hit list 250, the server computer 110
proceeds to a state 1218. At the state 1218, the server computer
110 adds one or more words from the hit list 250.

Referring again to the decision state 1216, if the server
computer 110 determines not to add words from the hit list
250, or alternatively, after completing the state 1218, the
server computer 110 proceeds to a decision state 1220.

At the decision state 1220, the server computer 110
determines whether to remove one or more words from the
index information that are identified by the drop list 260
(FIG. 3). If the server computer 110 determines to remove
one or more words identified by the drop list 260, the server
computer 110 proceeds to a state 1224. At the state 1224, the
server computer 110 removes one or more words from the
index information that are found in the drop list 260.
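States 1216 through 1224 amount to adding words from a hit list and then filtering the result against a drop list. The sketch below assumes that hit lists are kept in two hypothetical dictionaries, one keyed by data object and one keyed by group of data objects, and that an object-specific list, when present, takes precedence over its group's list; the specification says only that a hit list may be applied on either basis. All function and parameter names are illustrative.

```python
from typing import Dict, Iterable, List


def apply_hit_list(
    index: List[str],
    object_id: str,
    group_id: str,
    object_hit_lists: Dict[str, List[str]],
    group_hit_lists: Dict[str, List[str]],
) -> List[str]:
    """State 1218: append hit-list words that are not already indexed."""
    hits = object_hit_lists.get(object_id)
    if hits is None:
        # Assumed fallback: use the hit list of the object's group.
        hits = group_hit_lists.get(group_id, [])
    result = list(index)
    for word in hits:
        if word not in result:
            result.append(word)
    return result


def apply_drop_list(index: Iterable[str], drop_list: Iterable[str]) -> List[str]:
    """State 1224: remove every index word that appears in the drop list."""
    dropped = set(drop_list)
    return [word for word in index if word not in dropped]


index = apply_hit_list(["song", "the"], "obj-1", "pop",
                       {"obj-1": ["single"]}, {"pop": ["chart"]})
print(apply_drop_list(index, ["the", "a"]))  # ['song', 'single']
```

Running the drop list after the hit list, as in the flowchart order, means a word placed on both lists is ultimately excluded from the index information.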

Referring again to the decision state 1220, if the server
computer 110 (FIG. 1) determines not to remove one or more
words identified by the drop list 260, or, alternatively, after
completing the state 1224, the server computer 110 proceeds
to a decision state 1228. At the decision state 1228, the server
computer 110 determines whether the semantic network
module 220 (FIG. 3) is enabled. If the semantic network
module 220 is enabled, the server computer 110 proceeds to a
state 1232 and adds one or more words that have been
identified by the semantic network module 220 to the index
information.
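The specification does not describe how the semantic network module 220 is organized. One plausible reading, used here purely as an assumption, is a graph whose edges link related terms. The sketch walks such a graph breadth-first from the current index words and adds every term within a hypothetical `max_hops` limit; `semantic_neighbors`, `add_semantic_words`, and the example network are illustrative only.

```python
from collections import deque
from typing import Dict, List


def semantic_neighbors(seeds: List[str], network: Dict[str, List[str]], max_hops: int = 1) -> List[str]:
    """Return terms reachable from the seeds in at most max_hops edges."""
    seen = set(seeds)
    found: List[str] = []
    queue = deque((term, 0) for term in seeds)
    while queue:
        term, depth = queue.popleft()
        if depth == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in network.get(term, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found


def add_semantic_words(index: List[str], network: Dict[str, List[str]], max_hops: int = 1) -> List[str]:
    """State 1232: append semantically related words to the index."""
    return index + semantic_neighbors(index, network, max_hops)


network = {"guitar": ["instrument", "string"], "instrument": ["music"]}
print(add_semantic_words(["guitar"], network))  # ['guitar', 'instrument', 'string']
```

Keeping the hop limit small matters in practice: each additional hop pulls in more loosely related terms, which can enlarge the index beyond what the requesting IR system accepts.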

Referring again to the decision state 1228, if the semantic
network module 220 (FIG. 3) is not enabled, or,
alternatively, after completing the state 1232, the server
computer 110 (FIG. 1) proceeds to a decision state 1236. At the
decision state 1236, if the number of words in the index
information is greater than the number of words that are used
by the requesting IR system, the server computer 110 applies a
selection function to remove one or more words from the
index information. In one embodiment of the invention, the
server computer 110 prioritizes and maintains in the index
those words that occur with a high frequency in a high
number of documents. It is noted that the selection function
of the state 1236 may optionally be applied after the server
computer 110 executes any of the states 1116, 1132,
1140, 1148, 1212, 1218, or 1224. Continuing to an end state
1244, the server computer 110 proceeds to an end state 1248.
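The selection function of the state 1236 is described only as favoring words that occur with a high frequency in a high number of documents. A minimal sketch follows. It assumes that a word's score is its count in the index information multiplied by a hypothetical `doc_frequency` table giving the number of documents that contain it, and that the requesting IR system's limit is expressed as a number of distinct words (`max_words`). Both the scoring rule and the names are assumptions, not the patented function.

```python
from collections import Counter
from typing import Dict, List


def select_words(index_words: List[str], doc_frequency: Dict[str, int], max_words: int) -> List[str]:
    """Keep at most max_words distinct words, ranked by count * document frequency."""
    unique = list(dict.fromkeys(index_words))  # de-duplicate, preserving order
    if len(unique) <= max_words:
        return unique  # Within the IR system's limit: nothing beyond duplicates is removed.
    counts = Counter(index_words)
    # sorted() is stable, so ties keep their original index order.
    ranked = sorted(unique, key=lambda w: counts[w] * doc_frequency.get(w, 0), reverse=True)
    keep = set(ranked[:max_words])
    return [word for word in unique if word in keep]


words = ["song", "song", "album", "remix", "remix", "remix", "tour"]
freq = {"song": 40, "album": 30, "remix": 1, "tour": 5}
print(select_words(words, freq, 2))  # ['song', 'album']
```

In this example "remix" occurs most often in the object itself but appears in only one document, so the product score ranks it last; a word must do well on both factors to survive the trim.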

The present system provides a cost-effective solution for
providing index information to IR systems. The system does
not require any changes on the part of the IR system
providers. DRM-protected data objects can be used with the
IR systems as if the DRM-protected data objects were not
rights-protected at all. The system permits seamless, nearly
transparent, and immediate support for searching of
DRM-protected data objects, while allowing the DRM software to
remain in exclusive control over the DRM data objects.

Furthermore, one embodiment of the present invention
(FIG. 1) reduces the overhead that is associated with
maintaining index information for various heterogeneous IR
systems. The server computer 110 can generate customized
index information on the fly based upon the indexing
characteristics of the IR system. Furthermore, if the content
of the data objects 216A-216N changes, the server
computer 110 can automatically generate new index information
for the data object.

While the above detailed description has shown, 
described, and pointed out novel features of the invention as 
applied to various embodiments, it will be understood that 
various omissions, substitutions, and changes in the form 
and details of the device or process illustrated may be made 
by those skilled in the art without departing from the spirit 
of the invention. The scope of the invention is indicated by 
the appended claims rather than by the foregoing descrip- 
tion. All changes which come within the meaning and range 
of equivalency of the claims are to be embraced within their 
scope. 

What is claimed is: 

1. A method of generating index information for a data 
object, the method comprising: 

generating index information for the data object, wherein 
the index information includes a set of one or more 
keywords; 

selecting one or more of the keywords from the index 
information; 

identifying one or more words that are associated with the
selected keywords; and

adding the identified words to the set of keywords, the
identified words providing additional keywords for the
index information for the data object.

2. The method of claim 1, wherein identifying the one or 
more words that are associated with the selected keywords 
comprises using a thesaurus to identify one or more words 
that are related to the selected keywords. 

3. The method of claim 1, wherein identifying the one or 
more words that are associated with the selected keywords 
comprises using a semantic network to identify one or more 
words that are related to the selected keywords. 

4. The method of claim 1, additionally comprising obfus- 
cating at least a portion of the index information so that the 
intelligibility of the contents of the index information is 
reduced. 

5. The method of claim 1, wherein the electronic document
is dynamically generated in response to a request for
the network resource that is maintained by a server computer.

6. A method of generating index information for a data 
object, the method comprising: 

generating index information for the data object;

identifying one or more words that are common to a group of
data objects that includes the data object; and

adding the identified words to the index information.

7. The method of claim 6, additionally comprising obfus- 
cating at least a portion of the index information so that the 
intelligibility of the contents of the index information is 
reduced. 

8. The method of claim 6, wherein the index information 
is dynamically generated in response to a request for an 
electronic resource that is maintained by a server computer. 

9. A method of generating index information for a data 
object, the method comprising: 

generating index information for the data object, wherein 
the index information includes a set of one or more 
keywords; 

identifying the roots of selected keywords; and

substituting the selected keywords with the roots.

10. The method of claim 9, additionally comprising
obfuscating at least a portion of the index information so that
the intelligibility of the contents of the index information is
reduced.

11. The method of claim 9, wherein the roots are
automatically inserted into the index information upon a request
for an electronic resource which is maintained by a server
computer.

12. A method of generating index information for a data
object, the method comprising:

generating index information for the data object, wherein 
the index information includes a set of one or more 
keywords; 

classifying one or more of the keywords into one or more
classifications;

selecting at least one of the classifications; and

removing one or more of the keywords that are members
of a selected classification of the keywords.

13. The method of claim 12, wherein classifying the one 
or more keywords comprises identifying whether a respec- 
tive keyword is an adjective. 

14. A method of generating index information for a data 
object, the method comprising: 

generating index information for a data object, wherein 
the index information comprises one or more key- 
words; 

selecting one or more of the keywords; 

identifying one or more keywords that are associated with
the selected keywords with a semantic network; and

adding the identified keywords to the index information.